Section 1

BACKGROUND AND PROBLEM STATEMENT

1.1 Introduction and background

The research performed under this grant (DAAG 29-77-G-0093)
is a continuation of the work done by the author while participating
in the Laboratory Research Cooperative Program (September
1975 - May 1976), Task Order 76-20, Basic Agreement DAHC04 72-A-0001.
The contents of this report should thus be considered a complement
to the final report (VandeLinde [1]) for that work. In the rest of
this introduction, we give a brief description of the problem
considered in [1]. Next, in 1.2, we state the results obtained
therein, and finally, in 1.3, we describe the specific task
undertaken in our research (i.e., under grant #DAAG 29-77-G-0093).

The following identification problem was considered in [1]:

Let $\theta$ be the vector of unknown parameters
$[a_1, \ldots, a_p, b_1, \ldots, b_p]$ of a linear, single-input
single-output system modeled in discrete time by

    $y(n) + a_1 y(n-1) + \cdots + a_p y(n-p) = b_1 u(n-1) + \cdots + b_p u(n-p)$    (1.1)

    $z(n) = y(n) + w(n)$    (1.2)

where the following conditions are assumed to hold:

Assumption 1  The linear system is stationary and the order $p$
is known;

Assumption 2  The system inputs $\{u(n)\}$ are free from observation
errors and their statistical characteristics are completely known;

Assumption 3  The polynomials $A(x) = 1 + a_1 x + \cdots + a_p x^p$ and
$B(x) = b_1 + b_2 x + \cdots + b_p x^{p-1}$ have no common factors;
Assumption 4  The inputs $\{u(n)\}$ and the measurement noise $\{w(n)\}$
are both zero-mean, finite-variance, ergodic sequences independent
of each other;

Assumption 5  The probability distribution $F$ of $w(n)$ is
given by

    $F = (1-\epsilon)K + \epsilon C$    (1.3)

where
    $K$ is a symmetric, zero-mean, finite-variance distribution
        which is completely specified;
    $C$ is a distribution which is only specified as belonging
        to the class of all zero-mean, finite-variance,
        symmetric distributions;
    $\epsilon$ is a fixed number, $0 \le \epsilon < 1$, which may not
        be known. Typically, however, $\epsilon < 0.2$.

As $C$ varies over this class, (1.3) defines a convex set $P$ of
possible $F$.
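
As an illustration only, the contamination model (1.3) can be sketched
by sampling. The report fixes $K$ but leaves $C$ unspecified, so the
choice of a Gaussian $K$ and a wider Gaussian $C$ below, together with
the names `sample_contaminated`, `mixture_variance`, `sigma_k` and
`sigma_c`, are assumptions made here and are not taken from [1].

```python
import random

def sample_contaminated(n, eps, sigma_k=1.0, sigma_c=5.0, seed=0):
    """Draw n samples of w from F = (1 - eps) * K + eps * C.

    Assumed for illustration: K is N(0, sigma_k^2), C is N(0, sigma_c^2).
    """
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must satisfy 0 <= eps < 1")
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # With probability eps the sample comes from the contaminating part C.
        sigma = sigma_c if rng.random() < eps else sigma_k
        samples.append(rng.gauss(0.0, sigma))
    return samples

def mixture_variance(eps, sigma_k=1.0, sigma_c=5.0):
    """Variance of F when K and C are both zero-mean."""
    return (1.0 - eps) * sigma_k ** 2 + eps * sigma_c ** 2
```

Even a small $\epsilon$ inflates the variance considerably under these
assumed scales, which is the kind of sensitivity the robust scheme is
meant to reduce.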

Using only the measured inputs $\{u(n)\}$ and measured outputs
$\{z(n)\}$, the IDENTIFICATION TASK is to construct an on-line
identification scheme whose performance, as measured by the
variance (or root-mean-square error) of estimation and the speed of
convergence, is more or less uniform, i.e. robust, over different
distributions for $w(n)$. Next, in 1.2, we briefly describe the
results obtained in [1] for this (ROBUST) IDENTIFICATION TASK
(R.I.T.).

In closing, we mention here that the estimation of $\theta$
under Assumptions 1-4 only has been the most widely studied problem
in system identification (Astrom and Eykhoff [2], Eykhoff [3] and
Box and Jenkins [4]). In all such studies, though, the only
concern has been to estimate the parameters, with no consideration
of desensitizing identification performance to the distribution of
$w(n)$. Assumption 5 was added to the problem formulation in [1]
for two reasons. Firstly, it was deemed reasonable to assume
some, but not complete, knowledge of the operating environment.
Additionally, since any parameter estimation scheme is based on
the assumed model for $w(n)$, such a model (Assumption 5), which
allows for variations in the characteristics of $w(n)$, is clearly
necessary to carry out the task of constructing an identification
procedure that works more or less uniformly well in a variety of
situations.

1.2 Previous results

The approach in [1] to the stated R.I.T. is now described.

A review of comparative evaluations of the existing identification
methods (Isermann, Baur, Bamberger, Kneppo and Siebert [5], Saridis [6]
and Sinha and Sen [7]) established that the method of correlation,
henceforth referred to as CR, has the best performance as an
identification scheme for the parameter estimation of systems of
(1.1) and (1.2) under Assumptions 1-4. Additionally and more
importantly, while the $\{w(n)\}$ used in [5], [6] and [7] were quite
different, CR was uniformly the best method in each study. Given
these facts, the reasoning in [1] was straightforward: with the
addition of Assumption 5, CR could still be expected to have the
best performance for a particular $F$, although for different $F$
in $P$ the performance of CR itself would vary. Thus, desensitizing
the performance of CR to the distribution of $w(n)$ without
incurring a loss of its existing advantages would provide a
solution to the ROBUST IDENTIFICATION TASK. Before describing the
modification of CR done in [1], we briefly recapitulate the
method itself.
With the assumption of ergodicity, we have the autocorrelation
of the input given by

    $R_{uu}(m) = E[u(n)u(n-m)] = \lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} u(n)u(n-m)$    (1.4)

and the crosscorrelation between the output and the input given by

    $R_{zu}(m) = E[z(n)u(n-m)] = \lim_{N\to\infty} \frac{1}{N+1} \sum_{n=0}^{N} z(n)u(n-m)$    (1.5)


Since $E[w(n)u(m)] = 0$ for all $n, m$, we also have

    $R_{zu}(m) = E[z(n)u(n-m)]$
              $= E[y(n)u(n-m)] + E[w(n)u(n-m)]$
              $= E[y(n)u(n-m)]$
              $= R_{yu}(m)$    (1.6)


The convolution equation

    $R_{yu}(m) = \sum_{n=0}^{\infty} h_n R_{uu}(m-n)$    (1.7)


relates $R_{yu}$ to $R_{uu}$ through the impulse response $\{h_n\}$ of the
system. If the input autocorrelation is known, then (1.7) provides
estimates of the impulse response from estimates $\hat{R}_{zu}(m)$ of the
crosscorrelation, since $R_{zu}(m) = R_{yu}(m)$ from (1.6). In particular,
using a white $\{u(n)\}$, for which $R_{uu}(n) = 0$ for all $n \ne 0$, the
$\{h_k\}$ are simply obtained from

    $h_k = \frac{R_{zu}(k)}{R_{uu}(0)}$    (1.8)


The parameters $[a_1, \ldots, a_p, b_1, \ldots, b_p]$ can then be estimated
from the $\{h_k\}$ by a least squares procedure; see, for instance,
p. 32, Section 1 of [1].
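
A minimal sketch of the unmodified method of correlation, using
(1.1), (1.2) and (1.8) with a white binary input, may help fix ideas.
The first-order system, the noise level, the record length and all
function names below are hypothetical choices made for illustration;
sample averages stand in for the expectations in (1.4) and (1.5).

```python
import random

def simulate_output(a, b, u, w):
    """Run (1.1)-(1.2): y(n) = -sum a_i y(n-i) + sum b_i u(n-i), z = y + w."""
    p = len(a)
    y = []
    for n in range(len(u)):
        acc = 0.0
        for i in range(1, min(p, n) + 1):   # zero initial conditions
            acc += b[i - 1] * u[n - i] - a[i - 1] * y[n - i]
        y.append(acc)
    return [yn + wn for yn, wn in zip(y, w)]

def impulse_response_by_correlation(z, u, kmax):
    """Estimate h_k = R_zu(k) / R_uu(0), k = 0..kmax, from sample averages."""
    N = len(u)
    r_uu0 = sum(x * x for x in u) / N
    estimates = []
    for k in range(kmax + 1):
        r_zu = sum(z[n] * u[n - k] for n in range(k, N)) / N
        estimates.append(r_zu / r_uu0)
    return estimates

def demo(N=20000, seed=0):
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(N)]   # white binary input
    w = [rng.gauss(0.0, 0.5) for _ in range(N)]       # assumed Gaussian noise
    # Hypothetical system y(n) - 0.5 y(n-1) = u(n-1): h = 0, 1, 0.5, 0.25, ...
    z = simulate_output([-0.5], [1.0], u, w)
    return impulse_response_by_correlation(z, u, 3)
```

Calling `demo()` returns estimates of $h_0, \ldots, h_3$, which for this
assumed system should lie near 0, 1, 0.5 and 0.25.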

The modification of CR in [1] was a non-linear estimation of
the crosscorrelation instead of the usual linear scheme. We give
below the general non-linear form, of which the linear scheme is
just a special case with $g(\cdot)$ as the identity function:
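
    $\hat{R}_{zu}(m,n) = \hat{R}_{zu}(m,n-1) - A(n)\, g[\hat{R}_{zu}(m,n-1) - z(n)u(n-m)]$    (1.9)

with
    $n = 0, 1, 2, \ldots$
    $A(n) = 1/(n+1)$
    $\hat{R}_{zu}(m,-1) = 0$

and

    $g(x) = s_1 x$,                          $|x| < d_1$
         $= s_1 d_1 \,\mathrm{sgn}\, x$,      $d_1 \le |x| < d_2$
         $= s_2 (x - H \,\mathrm{sgn}\, x)$,  $d_2 \le |x| < H$
         $= 0$,                              $|x| \ge H$

    $H = \frac{s_2 d_2 - s_1 d_1}{s_2}$;   $s_1 > 0$, $s_2 \le 0$, $d_2 > d_1 > 0$    (1.10)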


$(s_1, s_2, d_1, d_2)$ were chosen as functions of the parameters of $K$,
the specified part of $F$. As with the unmodified CR,
$[a_1, \ldots, a_p, b_1, \ldots, b_p]$ were then estimated from the
$\{\hat{R}_{zu}(m)\}$.
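
The recursion (1.9) and the limiter (1.10) can be sketched as below.
This is an illustrative reading of the two equations, not code from
[1]; the function names are hypothetical, and the handling of $s_2 = 0$
(for which $H$ is infinite and the flat segment never ends) is an
assumption about the intended limiting case.

```python
import math

def g_limiter(x, s1, s2, d1, d2):
    """Piecewise-linear g of (1.10); needs s1 > 0, s2 <= 0, d2 > d1 > 0."""
    ax = abs(x)
    sign = math.copysign(1.0, x)
    if ax < d1:
        return s1 * x                  # linear segment
    if ax < d2 or s2 == 0:
        return s1 * d1 * sign          # flat segment (endless when s2 == 0)
    H = (s2 * d2 - s1 * d1) / s2       # point at which g returns to zero
    if ax < H:
        return s2 * (x - H * sign)     # redescending segment
    return 0.0

def recursive_crosscorrelation(z, u, m, g):
    """Iterate (1.9) with A(n) = 1/(n+1), starting from a zero estimate."""
    r = 0.0                                   # R_hat(m, -1) = 0
    for n in range(len(z)):
        u_lag = u[n - m] if n >= m else 0.0   # u(n) = 0 for n < 0
        r -= g(r - z[n] * u_lag) / (n + 1)
    return r
```

With $g$ the identity, each step reduces to
$\hat{R}_{zu}(m,n) = [n\,\hat{R}_{zu}(m,n-1) + z(n)u(n-m)]/(n+1)$, so the
recursion reproduces the running sample mean of $z(n)u(n-m)$, which is
the usual linear on-line estimate.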

Extensive computer simulations were carried out in [1],
with 20 different distributions for $w(n)$, seven sets of values
for the parameters $(s_1, s_2, d_1, d_2)$ and three linear processes, to
study the performance of the (modified) method of correlation for
different $w(n)$ using these non-linear estimators of $R_{zu}$.
One of the seven sets of values of $(s_1, s_2, d_1, d_2)$ corresponded to
the linear estimator (i.e. $g(x) = x$ for all $x$), which is the common
method of estimating crosscorrelations on-line. The results of [1]
showed quite clearly that identification of
$[a_1, \ldots, a_p, b_1, \ldots, b_p]$ using the non-linear estimates of
$R_{zu}$ was indeed 'robust' in the sense of better and more uniform
performance than CR for such varying $w(n)$. However, some important
work remained to be done. We next describe the specific task which
arises naturally from the work of [1] and which we have undertaken
in our research.

1.3 Problem statement

There is a primary need to show, for any estimation scheme,
that the procedure is consistent in estimating the unknown
parameters. Although the simulations in [1] did not yield any
divergent results, the consistency of the non-linear estimators of
$R_{zu}$ (given by (1.9)) was not shown analytically. This is the
specific task undertaken in this research: to demonstrate
analytically the consistency of the 'robust' estimators of $R_{zu}$
as defined by equation (1.9).
Section 2

CONSISTENCY OF SOME ROBUST ESTIMATORS
OF CROSSCORRELATION


2.1 Notation, assumptions and preliminary results

For notational convenience, we re-write equation (1.9) as

    $x(n) = x(n-1) - A(n)\, g[x(n-1) - z(n)u(n-m)]$    (2.1)

with
    $n = 0, 1, 2, \ldots$
    $x(-1) = 0$
    $A(n) = 1/(n+1)$

and $g(\cdot)$ defined by (1.10).

Clearly, $x(n)$ represents $\hat{R}_{zu}(m,n)$. Note that $m$ can be
suppressed in the index of $x$, since it is fixed for a set of
iterations.

To prove the consistency of (2.1), we need two additional
conditions (besides Assumptions 1-5):

Assumption 6  $\{u(n)\}$ is an independent, binary sequence;

Assumption 7  $g(\cdot)$ is the so-called 'soft-limiter' function,
defined as

    $g(x) = x$,                      $|x| < d_1$
         $= d_1 \,\mathrm{sgn}\, x$,  $|x| \ge d_1$    (2.2)

Clearly, equation (2.2) is a special case of equation (1.10) with
$s_1 = 1$, $s_2 = 0$, $d_2 = \infty$. As before, $d_1$ is chosen as a
function of the parameters of $K$, the known part of the distribution
for $w(n)$.

Remark: Since the method of correlation is most often implemented
during normal operation with an independent, binary sequence as the
common choice of test signal (Sage and Melsa [8]), Assumption 6 is
not an unrealistic condition.
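
The estimator (2.1) under Assumptions 6 and 7 can be exercised on
simulated data as follows. This is only a sketch: the first-order
system, the contaminated Gaussian noise, the value $d_1 = 2$ and the
function names are hypothetical choices, not values used in [1].

```python
import random

def soft_limiter(x, d1):
    """g of (2.2): identity on (-d1, d1), clipped to +/- d1 outside."""
    return max(-d1, min(d1, x))

def robust_crosscorrelation(z, u, m, d1):
    """Iterate (2.1): x(n) = x(n-1) - g[x(n-1) - z(n) u(n-m)] / (n+1)."""
    x = 0.0                                   # x(-1) = 0
    for n in range(len(z)):
        u_lag = u[n - m] if n >= m else 0.0   # u(n) = 0 for n < 0
        x -= soft_limiter(x - z[n] * u_lag, d1) / (n + 1)
    return x

def demo(N=20000, m=1, d1=2.0, seed=1):
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(N)]   # Assumption 6
    # Hypothetical system y(n) = 0.5 y(n-1) + u(n-1), so h_1 = 1.
    y, prev = [], 0.0
    for n in range(N):
        prev = 0.5 * prev + (u[n - 1] if n >= 1 else 0.0)
        y.append(prev)
    # Assumed noise: N(0, 0.5^2) contaminated by a 10% share of N(0, 5^2).
    z = [yn + rng.gauss(0.0, 5.0 if rng.random() < 0.1 else 0.5) for yn in y]
    return robust_crosscorrelation(z, u, m, d1)
```

For this assumed system $R_{zu}(1) = h_1 R_{uu}(0) = 1$, so `demo()`
should return a value close to 1 despite the heavy-tailed noise.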
Our method of proof needs, not surprisingly, results regarding
properties of the probability distribution of $z(n)u(n-m)$ for
systems of (1.1) and (1.2) under Assumptions 1-7. We consider
first the more general case with Assumption 6 replaced by
Assumption 6-a:

Assumption 6-a  $\{u(n)\}$ is an identically distributed, independent
sequence with a zero-mean symmetric distribution (not necessarily
binary).

Then for zero initial conditions and $u(n) = 0$ for $n < 0$, we
may write

    $y(n) = \sum_{j=0}^{n-1} h_{n-j}\, u(j)$    (2.3)

for the natural, undisturbed output of a system defined by equation
(1.1). Thus,

    $z(n)u(n-m) = \Big[\sum_{j=0}^{n-1} h_{n-j}\, u(j) + w(n)\Big] u(n-m)$

               $= \sum_{j=0,\, j \ne n-m}^{n-1} h_{n-j}\, u(j)u(n-m) + w(n)u(n-m) + h_m u^2(n-m)$    (2.4)

in which the three terms on the right-hand side are referred to, in
order, as Term I, Term II and Term III.

We now state our first result:

Lemma 1: If $\{u(n)\}$ is independent, zero-mean and symmetrically
distributed, the distribution of $z(n)u(n-m)$ given $u(n-m)$ is
symmetric around its mean of $h_m u^2(n-m)$.

We write $[A|B]$ to denote a random variable $A$ given a
random variable $B$.
Proof: The distribution of $[z(n)u(n-m) | u(n-m)]$ is simply the
distribution of $z(n)$ scaled by a constant ($= u(n-m)$). So we
consider equation (2.4) with $u(n-m)$ now a constant.

Term I is now just a sum of zero-mean, independent random
variables with (identical) symmetric distributions, and so is
itself a random variable with zero mean and a symmetric
distribution (Papoulis [9]). Term II is also a zero-mean,
symmetrically distributed random variable, since $w(n)$ is such and
the zero mean and symmetry of a random variable are unaffected by
multiplication by a constant. Also, Term II is independent of Term I.

Term III, however, is a constant, since $u(n-m)$ is given,
and not a random variable.

The stated result (Lemma 1) immediately follows.

Remark: Lemma 1 states that

    $E[z(n)u(n-m) | u(n-m)] = h_m u^2(n-m)$    (2.5)

where $E[\cdot]$ denotes the expectation operator. Remembering that
$E[A] = E_B[E[A|B]]$, where the outer expectation $E_B$ is over $B$,
we may check our result (of Lemma 1) by computing $E[z(n)u(n-m)]$
from $E_u[E[z(n)u(n-m) | u(n-m)]]$. Thus,

    $E[z(n)u(n-m)] = E_u[E[z(n)u(n-m) | u(n-m)]]$
                  $= E_u[h_m u^2(n-m)]$    (from Lemma 1)
                  $= h_m E_u[u^2(n-m)]$
                  $= h_m R_{uu}(0)$    (2.6)

We have already seen that $R_{zu}(m) = E[z(n)u(n-m)] = h_m R_{uu}(0)$ for
white (zero-mean, independent) $\{u(n)\}$, so our result checks.

Henceforth, we refer to $h_m R_{uu}(0)$ by $R$.
If we impose the more restrictive Assumption 6 instead of
6-a, that is, $\{u(n)\}$ is now an independent, binary sequence,
the following result is obtained:

Lemma 2: If $\{u(n)\}$ is an independent, binary sequence, $z(n)u(n-m)$
has a symmetric distribution around a mean $R$ $(= h_m R_{uu}(0))$.

Proof: From Lemma 1, the distribution of $[z(n)u(n-m) | u(n-m)]$ is
symmetric around its mean $h_m u^2(n-m)$.

Since $\{u(n)\}$ is binary, $u^2(n)$ = constant for all $n$. In
other words,

    $R_{uu}(0) = E[u^2(n)]$
              $=$ constant
              $= u^2(n-m)$, in particular    (2.7)

That is, the distribution of $[z(n)u(n-m) | u(n-m)]$ is symmetric
around a mean ($= R$) which is independent of $u(n-m)$,
the conditioning random variable. Thus, our result follows.

Remark: We could as well prove Lemma 2 by directly considering
equation (2.4).

Term I is the sum of products of pairs of independent,
identically distributed random variables which are zero-mean and
symmetric, and so is itself zero-mean and symmetrically distributed.

Similarly, Term II is also a zero-mean, symmetrically
distributed random variable. The sum of Terms I and II is still
zero-mean and symmetrically distributed. Since Term III,
$h_m u^2(n-m)$, is a constant for binary $\{u(n)\}$, Lemma 2 follows.

We require one more preliminary result before proceeding to
the proof of consistency for (2.1):
Lemma 3: For $\{u(n)\}$ independent and binary,

    $E[g[R - z(n)u(n-m)]] = 0$    (2.8)

Proof: The distribution of $z(n)u(n-m)$ is symmetric around its
mean $R$ (Lemma 2). Define a new random variable $Q = R - z(n)u(n-m)$.
Clearly, $Q$ has a symmetric distribution about a mean of zero.

Now,

    $E[g[R - z(n)u(n-m)]] = \int g[R - z(n)u(n-m)]\, dD_{zu}$    (2.9)

where $D_{zu}$ is the distribution of $z(n)u(n-m)$. Substituting $Q$
for $[R - z(n)u(n-m)]$, we can write

    $E[g[R - z(n)u(n-m)]] = \int g(Q)\, dD_Q$    (2.10)

where $D_Q$ is the distribution of $Q$.

Since $g(\cdot)$ is an odd, bounded, continuous function and $D_Q$ is
symmetric about zero, the contributions to (2.10) from $Q$ and $-Q$
cancel, and our desired result follows.
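
Lemmas 2 and 3 can be checked numerically on simulated data. The
sketch below is a Monte Carlo illustration under assumed conditions
(a hypothetical first-order system with $h_1 = 1$, lag $m = 1$,
Gaussian noise and $d_1 = 2$); it is not a substitute for the proofs,
and the function names are introduced here only for the example.

```python
import random

def soft_limiter(x, d1):
    """g of (2.2)."""
    return max(-d1, min(d1, x))

def lemma_checks(N=20000, d1=2.0, seed=2):
    """Return (mean of v, fraction of v above R, mean of g(R - v)),
    where v = z(n) u(n-1) and R = h_1 R_uu(0) = 1 for the system below."""
    rng = random.Random(seed)
    u = [rng.choice((-1.0, 1.0)) for _ in range(N)]   # binary, R_uu(0) = 1
    R = 1.0
    y, v = 0.0, []
    for n in range(N):
        # Hypothetical system y(n) = 0.5 y(n-1) + u(n-1), so h_1 = 1.
        y = 0.5 * y + (u[n - 1] if n >= 1 else 0.0)
        z = y + rng.gauss(0.0, 0.5)                   # assumed noise
        if n >= 1:
            v.append(z * u[n - 1])
    mean_v = sum(v) / len(v)
    frac_above = sum(1 for x in v if x > R) / len(v)
    mean_g = sum(soft_limiter(R - x, d1) for x in v) / len(v)
    return mean_v, frac_above, mean_g
```

If the lemmas hold, the three returned values should be near $R = 1$,
near one half (symmetry about $R$) and near zero (equation (2.8)),
respectively.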

We are now in a position to state and prove our main result,
which we do in 2.2.

2.2 Proof of Consistency

The principal result of our work may be stated as:

Theorem

Given a linear system described by equations (1.1) and (1.2)
under Assumptions 1-6, equation (2.1) (or (1.9)) defines a
crosscorrelation estimate that is consistent, with convergence
in the mean square, for $g(\cdot)$ defined by Assumption 7.
In other words, under the assumed conditions,

    $\hat{R}_{zu}(m,n) \xrightarrow{\text{m.s.}} R_{zu}(m)$

where $\xrightarrow{\text{m.s.}}$ denotes convergence in mean square.


Proof: We essentially follow the method of Robbins and Monro [10].
Recalling equation (2.1),

    $x(n) = x(n-1) - A(n)\, g[x(n-1) - z(n)u(n-m)]$    (2.11)

with
    $n = 0, 1, 2, \ldots$
    $x(-1) = 0$
    $A(n) = 1/(n+1)$, so that $\sum_{n=1}^{\infty} A(n) = \infty$ and
    $\sum_{n=1}^{\infty} A^2(n) < \infty$.

Remembering that $R_{zu}(m) = h_m R_{uu}(0) = R$, we set

    $x(n) - R = [x(n-1) - R] - A(n)\, g[x(n-1) - z(n)u(n-m)]$    (2.12)

Squaring both sides,

    $[x(n) - R]^2 = [x(n-1) - R]^2 + A^2(n) \{g[x(n-1) - z(n)u(n-m)]\}^2$
                 $- 2A(n)[x(n-1) - R]\, g[x(n-1) - z(n)u(n-m)]$    (2.13)

Taking expectations conditional on $x(n-1)$,

    $E[\{x(n) - R\}^2 | x(n-1)] = [x(n-1) - R]^2$
                 $+ A^2(n) E[\{g[x(n-1) - z(n)u(n-m)]\}^2 | x(n-1)]$
                 $- 2A(n)[x(n-1) - R]\, E[g[x(n-1) - z(n)u(n-m)] | x(n-1)]$    (2.14)

Set $E[g[x(n-1) - z(n)u(n-m)] | x(n-1)] = \beta[x(n-1)]$.

There are three cases to be considered:

(a) $x(n-1) = R$: $\beta[x(n-1)] = 0$ by Lemma 3.    (2.15)

(b) $x(n-1) > R$: Point by point, $[x(n-1) - z(n)u(n-m)] > [R - z(n)u(n-m)]$,
and since $g(\cdot)$ is an odd, bounded, continuous, monotonically
increasing function,

    $\beta[x(n-1)] > 0$.    (2.16)

(c) $x(n-1) < R$: Point by point, $[x(n-1) - z(n)u(n-m)] < [R - z(n)u(n-m)]$,
and by reasoning similar to that in (b) above,

    $\beta[x(n-1)] < 0$.    (2.17)

Setting $E[x(n) - R]^2 = B(n)$, and using (2.14), we obtain

    $B(n) = B(n-1) + A^2(n) E[E[\{g[x(n-1) - z(n)u(n-m)]\}^2 | x(n-1)]]$
          $- 2A(n) E[\{x(n-1) - R\}\, \beta[x(n-1)]]$    (2.18)

Set $E[E[\{g[x(n-1) - z(n)u(n-m)]\}^2 | x(n-1)]] = f(n)$
and $E[\{x(n-1) - R\}\, \beta[x(n-1)]] = e(n)$.

Since we are only considering distributions of finite variance,
and $g(\cdot)$ is odd, continuous and bounded, it follows [10] that
$0 \le f(n) < \infty$. Also, because of (2.15), (2.16) and (2.17),
$e(n) \ge 0$.

We may re-write (2.18) as

    $B(n) = B(n-1) + A^2(n) f(n) - 2A(n) e(n)$    (2.19)


Summing (2.19) from $j = 2$ to $n$, we get

    $B(n) = B(1) + \sum_{j=2}^{n} A^2(j) f(j) - 2 \sum_{j=2}^{n} A(j) e(j)$    (2.20)

Note first that with $f(n) \ge 0$ bounded (since $g(\cdot)$ is bounded)
and $\sum_{j=1}^{\infty} A^2(j) < \infty$, we have, from [10],

    $\sum_{j=1}^{\infty} A^2(j) f(j) < \infty$

Since $B(n) \ge 0$, from (2.20) we have

    $\sum_{j=2}^{n} A(j) e(j) \le \frac{1}{2} \Big[B(1) + \sum_{j=1}^{\infty} A^2(j) f(j)\Big] < \infty$    (2.21)

Hence, the positive term series $\sum_{j=2}^{\infty} A(j) e(j)$ converges.
Thus,

    $\lim_{n\to\infty} B(n) = B(1) + \sum_{j=2}^{\infty} A^2(j) f(j) - 2 \sum_{j=2}^{\infty} A(j) e(j) = B$

exists and $B \ge 0$.
Since $A(n) = 1/(n+1)$, so that $\sum A(j)$ diverges while
$\sum A(j) e(j)$ converges, we have from [10],

    $\lim_{n\to\infty} B(n) = B = 0$,

i.e.

    $\lim_{n\to\infty} E[x(n) - R]^2 = 0$

or, $\hat{R}_{zu}(m,n) \to R_{zu}(m)$ in mean square.
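
The behaviour of $B(n) = E[x(n) - R]^2$ can also be observed
empirically. The following Monte Carlo sketch averages the squared
error of (2.1) over independent runs; the system, the contaminated
noise, $d_1 = 2$, the lag $m = 1$ and the function name are all
assumptions made for illustration, and a simulation of this kind
supports but does not prove the theorem.

```python
import random

def mse_of_estimate(n_steps, runs=200, d1=2.0, R=1.0, seed=3):
    """Monte Carlo estimate of B(n) after n_steps iterations of (2.1), m = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x, y, u_prev = 0.0, 0.0, 0.0          # x(-1) = 0, u(-1) = 0
        for n in range(n_steps):
            u_now = rng.choice((-1.0, 1.0))   # binary input (Assumption 6)
            # Hypothetical system y(n) = 0.5 y(n-1) + u(n-1), so R = h_1 = 1.
            y = 0.5 * y + u_prev
            # Assumed noise: N(0, 0.5^2) with a 10% share of N(0, 5^2).
            w = rng.gauss(0.0, 5.0 if rng.random() < 0.1 else 0.5)
            v = (y + w) * u_prev              # z(n) u(n-1)
            x -= max(-d1, min(d1, x - v)) / (n + 1)   # soft limiter (2.2)
            u_prev = u_now
        total += (x - R) ** 2
    return total / runs
```

Comparing, say, `mse_of_estimate(100)` with `mse_of_estimate(2000)`
should show the averaged squared error shrinking as $n$ grows, in line
with $B(n) \to 0$.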


Remark 1: For consistency, we only have to show that

    $\lim_{n\to\infty} P[\,|\hat{R}_{zu}(m,n) - R_{zu}(m)| > \epsilon\,] = 0$,

i.e. convergence in probability. Our theorem demonstrates mean
square convergence of $\hat{R}_{zu}(m,n)$ to $R_{zu}(m)$, and since
convergence in mean square implies convergence in probability, we
have established a stronger form of convergence than is required
for consistency.
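
The implication used in Remark 1 is the standard consequence of
Chebyshev's inequality. Written with $B(n) = E[x(n) - R]^2$ as in the
proof above, a short worked step (a textbook argument, not taken from
the report) is:

```latex
P\big[\,|\hat{R}_{zu}(m,n) - R_{zu}(m)| > \epsilon\,\big]
  \;\le\; \frac{E\big[\hat{R}_{zu}(m,n) - R_{zu}(m)\big]^2}{\epsilon^2}
  \;=\; \frac{B(n)}{\epsilon^2}
  \;\longrightarrow\; 0
  \quad \text{as } n \to \infty, \text{ for each fixed } \epsilon > 0 .
```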

Remark 2: Some of the results of this section have been reported
earlier (Basu and VandeLinde [11]), and details of all results
(of Section 2) can be found in Basu [12].


Section 3


CONCLUDING REMARKS

The principal result of this research has been to show the
consistency, with convergence in mean square, of the robust
estimator of crosscorrelation described in [1]. There are
several important issues, however, that remain to be addressed.

Clearly, the first obvious undertaking would be relaxations
of Assumptions 6 and 7.

Specifically, one might attempt to prove consistency under
the following sets of assumptions:

(a) Assumptions 1-5, 6-a and 7

(b) Assumptions 1 through 7, with 6-a replacing 6

(c) Assumptions 1 through 6

(d) Assumptions 1 through 5 and 6-a

(e) Assumptions 1 through 5 alone

We have listed the above sets (a)-(e) in increasing order
of generality (of problem formulation) and of difficulty (of
solution). We should mention, though, that none of these tasks
is trivial. In fact, our conjecture is that, for these possible
relaxations of Assumptions 6 and 7, we might have to pursue a
different approach to prove consistency. Recently, in 1977,
Ljung [13] and Kushner [14] have proved convergence of certain
classes of recursive stochastic algorithms using ideas of
stability and weak convergence theory, respectively. While their
results are not directly applicable to our case, their methods
might still be of some benefit; at the very least, they provide
two alternative avenues of approach.
In a more general regard, it would be desirable to derive
asymptotic properties of the recursive estimator of $R_{zu}$;
namely, does the distribution of $\{\hat{R}_{zu}(m,n)\}$ converge to
some stable distribution, hopefully normal? Asymptotic normality
would provide the considerable benefit of an explicit expression
for the asymptotic variance of the estimator, enabling in turn
a more analytical treatment of the robust identification
problem (see, for example, Appendix B of [1] for general ideas
of precisely formulating a robust problem). Our recursive
scheme, unfortunately, has an inherent lack of tractable
mathematical structure for tackling such questions. We note
that equation (2.1) defines a sequence $\{x(n)\}$ which is not
stationary, not Markovian, nor does it have any martingale
properties. It is small consolation to add that all general,
stochastic, recursive algorithms operating on dependent data
suffer from similar drawbacks and, to date, no results have
been obtained with regard to asymptotic distributions of such
schemes.

Other questions of interest are the choice of $d_1$ (for
Assumption 7), and the choice of $(s_1, s_2, d_1, d_2)$ in the more
general case, for improving the rate of convergence. Last,
but not least, is the question of the 'best' transformation to
use to obtain the estimates $\{a_i, b_i\}$ from the $\{h_k\}$. For
noisy $\{h_k\}$, as is obviously the case here, this is quite an
open question.